Results 1 - 9 of 9
1.
JCO Clin Cancer Inform ; 8: e2300207, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38427922

ABSTRACT

PURPOSE: Although immune checkpoint inhibitors (ICIs) have improved outcomes in certain patients with cancer, they can also cause life-threatening immunotoxicities. Predicting immunotoxicity risks alongside response could provide a personalized risk-benefit profile, inform therapeutic decision making, and improve clinical trial cohort selection. We aimed to build a machine learning (ML) framework using routine electronic health record (EHR) data to predict hepatitis, colitis, pneumonitis, and 1-year overall survival. METHODS: Real-world EHR data of more than 2,200 patients treated with ICI through December 31, 2018, were used to develop predictive models. Using a prediction time point of ICI initiation, a 1-year prediction time window was applied to create binary labels for the four outcomes for each patient. Feature engineering involved aggregating laboratory measurements over appropriate time windows (60-365 days). Patients were randomly partitioned into training (80%) and test (20%) sets. Random forest classifiers were developed using a rigorous model development framework. RESULTS: The patient cohort had a median age of 63 years and was 61.8% male. Patients predominantly had melanoma (37.8%), lung cancer (27.3%), or genitourinary cancer (16.4%). They were treated with PD-1 (60.4%), PD-L1 (9.0%), and CTLA-4 (19.7%) ICIs. Our models demonstrate reasonably strong performance, with AUCs of 0.739, 0.729, 0.755, and 0.752 for the pneumonitis, hepatitis, colitis, and 1-year overall survival models, respectively. Each model relies on an outcome-specific feature set, though some features are shared among models. CONCLUSION: To our knowledge, this is the first ML solution that assesses individual ICI risk-benefit profiles based predominantly on routine structured EHR data. As such, use of our ML solution will not require additional data collection or documentation in the clinic.
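The labeling and modeling setup described above (binary 1-year outcome labels, an 80/20 random split, and a random forest classifier) can be sketched as follows. The feature matrix and labels here are synthetic stand-ins, not the study's EHR data, and the feature count is arbitrary.

```python
# Sketch of the outcome-labeling and random-forest setup described above.
# Data are synthetic; the study aggregated real lab measurements over
# 60-365 day windows before ICI initiation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
# Toy feature matrix: e.g., lab values aggregated over a pre-treatment window.
X = rng.normal(size=(n, 10))
# Binary label: did the outcome (e.g., colitis) occur within 1 year of ICI start?
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 80/20 random partition, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```

One such model would be trained per outcome (pneumonitis, hepatitis, colitis, survival), each with its own feature set.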


Subjects
Colitis, Hepatitis, Pneumonia, Humans, Male, Middle Aged, Female, Immune Checkpoint Inhibitors, Ambulatory Care Facilities, Pneumonia/chemically induced, Pneumonia/diagnosis
3.
JAMIA Open ; 6(1): ooad017, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37012912

ABSTRACT

Objective: Automatically identifying patients at risk of immune checkpoint inhibitor (ICI)-induced colitis allows physicians to improve patient care. However, predictive models require training data curated from electronic health records (EHR). Our objective is to automatically identify notes documenting ICI-colitis cases to accelerate data curation. Materials and Methods: We present a data pipeline to automatically identify ICI-colitis from EHR notes, accelerating chart review. The pipeline relies on BERT, a state-of-the-art natural language processing (NLP) model. The first stage of the pipeline segments long notes using keywords identified through a logistic classifier and applies BERT to identify ICI-colitis notes. The next stage uses a second BERT model tuned to identify false positives and remove notes that mention colitis only as a potential side effect. The final stage further accelerates curation by highlighting the colitis-relevant portions of notes. Specifically, we use BERT's attention scores to find high-density regions describing colitis. Results: The overall pipeline identified colitis notes with 84% precision and reduced the curator note review load by 75%. The segment BERT classifier had a high recall of 0.98, which is crucial to identify the low incidence (&lt;10%) of colitis. Discussion: Curation from EHR notes is a burdensome task, especially when the curation topic is complicated. Methods described in this work are not only useful for ICI colitis but can also be adapted for other domains. Conclusion: Our extraction pipeline reduces manual note review load and makes EHR data more accessible for research.
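The first pipeline stage (keyword-based segmentation of long notes before BERT classification) can be sketched roughly as below. The keyword list, window size, and example note are hypothetical; in the paper the keywords come from a logistic classifier, and the resulting segments are fed to a fine-tuned BERT model (not shown here).

```python
# Sketch of the first pipeline stage: segment a long note into windows
# centered on colitis-related keywords, so a length-limited BERT classifier
# (not shown) only sees relevant text. Keywords here are hypothetical.
import re

KEYWORDS = ["colitis", "diarrhea", "stool"]  # e.g., chosen by a logistic classifier

def segment_note(note: str, window: int = 100) -> list[str]:
    """Return character windows of +/- `window` chars around each keyword hit."""
    segments = []
    for kw in KEYWORDS:
        for m in re.finditer(kw, note, flags=re.IGNORECASE):
            start = max(0, m.start() - window)
            end = min(len(note), m.end() + window)
            segments.append(note[start:end])
    return segments

note = "Patient on pembrolizumab reports grade 2 diarrhea; suspect ICI colitis."
for seg in segment_note(note, window=30):
    print(seg)
```

Keeping recall high at this stage matters because, as the abstract notes, colitis incidence is low (&lt;10%): any note dropped here never reaches the downstream classifiers.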

4.
AMIA Annu Symp Proc ; 2022: 884-891, 2022.
Article in English | MEDLINE | ID: mdl-37128469

ABSTRACT

Data curation is a bottleneck for many informatics pipelines. A specific example of this is aggregating data from preclinical studies to identify novel genetic pathways for atherosclerosis in humans. This requires extracting data from published mouse studies such as the perturbed gene and its impact on lesion sizes and plaque inflammation, which is non-trivial. Curation efforts are resource-heavy, with curators manually extracting data from hundreds of publications. In this work, we describe the development of a semi-automated curation tool to accelerate data extraction. We use natural language processing (NLP) methods to auto-populate a web-based form which is then reviewed by a curator. We conducted a controlled user study to evaluate the curation tool. Our NLP model has a 70% accuracy on categorical fields and our curation tool accelerates task completion time by 49% compared to manual curation.
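The auto-population step might look roughly like the following: extract candidate values for the form's categorical fields from a sentence, then let the curator review and override them. The patterns, field names, and example sentence are all hypothetical; the paper's NLP model is more sophisticated than this regex sketch.

```python
# Minimal sketch of auto-populating a curation form from a study sentence.
# A curator reviews and corrects each pre-filled field. Patterns are
# hypothetical illustrations, not the paper's actual model.
import re

def extract_fields(sentence: str) -> dict:
    """Pre-fill form fields: perturbed gene and direction of lesion effect."""
    gene = re.search(r"\b([A-Z][A-Z0-9]{2,})\b", sentence)
    lower = sentence.lower()
    if "increased" in lower:
        effect = "increased"
    elif "decreased" in lower or "reduced" in lower:
        effect = "decreased"
    else:
        effect = None  # curator must fill this in manually
    return {"gene": gene.group(1) if gene else None, "lesion_effect": effect}

print(extract_fields("TREM1 deletion reduced lesion size in ApoE-/- mice."))
```

Even with imperfect field accuracy (70% on categorical fields in the study), reviewing a pre-filled form is faster than extracting every value from scratch, which is where the reported 49% speedup comes from.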


Subjects
Data Curation, Natural Language Processing, Humans, Animals, Mice, Data Curation/methods, Publications
5.
Arterioscler Thromb Vasc Biol ; 42(1): 35-48, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34758633

ABSTRACT

OBJECTIVE: Animal models of atherosclerosis are used extensively to interrogate molecular mechanisms in serial fashion. We tested whether a novel systems biology approach to integration of preclinical data identifies novel pathways and regulators in human disease. Approach and Results: Of 716 articles published in ATVB from 1995 to 2019 using the apolipoprotein E knockout mouse to study atherosclerosis, data were extracted from 360 unique studies in which a gene was experimentally perturbed to impact plaque size or composition and analyzed using Ingenuity Pathway Analysis software. TREM1 (triggering receptor expressed on myeloid cells) signaling and LXR/RXR (liver X receptor/retinoid X receptor) activation were identified as the top atherosclerosis-associated pathways in mice (both P&lt;1.93×10⁻⁴; TREM1 implicated early and LXR/RXR in late atherogenesis). The top upstream regulatory network in mice (sc-58125, a COX2 inhibitor) linked 64.0% of the genes into a single network. The pathways and networks identified in mice were interrogated by testing for associations between the genetically predicted gene expression of each mouse pathway-identified human homolog with clinical atherosclerosis in a cohort of 88,660 human subjects. Homologous human pathways and networks were significantly enriched for gene-atherosclerosis associations (empirical P&lt;0.01 for TREM1 and LXR/RXR pathways and COX2 network). This included 12 (60.0%) TREM1 pathway genes, 15 (53.6%) LXR/RXR pathway genes, and 67 (49.3%) COX2 network genes. Mouse analyses predicted, and human study validated, the strong association of COX2 expression (PTGS2) with increased likelihood of atherosclerosis (odds ratio, 1.68 per SD of genetically predicted gene expression; P=1.07×10⁻⁶). CONCLUSIONS: PRESCIANT (Preclinical Science Integration and Translation) leverages published preclinical investigations to identify high-confidence pathways, networks, and regulators of human disease.
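The "empirical P" values above come from a permutation-style enrichment test: count how many pathway genes carry human gene-atherosclerosis associations, then compare that count against random gene sets of the same size. A minimal sketch, with tiny hypothetical gene sets in place of the study's genome-wide data:

```python
# Sketch of an empirical enrichment p-value: compare the number of
# atherosclerosis-associated genes in a pathway against random gene sets
# of the same size. Gene sets here are small and hypothetical.
import random

random.seed(0)
all_genes = [f"g{i}" for i in range(1000)]
associated = set(random.sample(all_genes, 200))  # genes with human associations
pathway = random.sample(all_genes, 20)           # e.g., TREM1 pathway homologs

observed = sum(g in associated for g in pathway)
n_perm = 10_000
hits = sum(
    sum(g in associated for g in random.sample(all_genes, len(pathway))) >= observed
    for _ in range(n_perm)
)
# Add-one correction keeps the estimate strictly positive.
p_empirical = (hits + 1) / (n_perm + 1)
print(f"observed={observed}, empirical P={p_empirical:.4f}")
```

A small empirical P indicates the pathway's human homologs carry more associations than expected by chance, which is the sense in which the TREM1 and LXR/RXR pathways and the COX2 network were "enriched."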


Subjects
Apolipoproteins E/genetics, Atherosclerosis/genetics, Gene Regulatory Networks, Systems Biology, Adult, Aged, Animals, Apolipoproteins E/deficiency, Atherosclerosis/metabolism, Atherosclerosis/pathology, Disease Models, Animal, Female, Genetic Predisposition to Disease, Humans, Male, Mice, Knockout, ApoE, Middle Aged, Phenotype, Plaque, Atherosclerotic, Risk Assessment, Risk Factors, Sex Factors, Species Specificity
6.
JMIR Med Inform ; 8(11): e19612, 2020 Nov 05.
Article in English | MEDLINE | ID: mdl-33151150

ABSTRACT

Digitization of health records has allowed the health care domain to adopt data-driven algorithms for decision support. There are multiple people involved in this process: a data engineer who processes and restructures the data, a data scientist who develops statistical models, and a domain expert who informs the design of the data pipeline and consumes its results for decision support. Although there are multiple data interaction tools for data scientists, few exist to allow domain experts to interact with data meaningfully. Designing systems for domain experts requires careful thought because they have different needs and characteristics from other end users. There should be an increased emphasis on the system to optimize the experts' interaction by directing them to high-impact data tasks and reducing the total task completion time. We refer to this optimization as amplifying domain expertise. Although there is active research in making machine learning models more explainable and usable, it focuses on the final outputs of the model. However, in the clinical domain, expert involvement is needed at every pipeline step: curation, cleaning, and analysis. To this end, we review literature from the database, human-computer interaction, and visualization communities to demonstrate the challenges and solutions at each of the data pipeline stages. Next, we present a taxonomy of expertise amplification, which can be applied when building systems for domain experts. This includes summarization, guidance, interaction, and acceleration. Finally, we demonstrate the use of our taxonomy with a case study.

7.
Article in English | MEDLINE | ID: mdl-32312778

ABSTRACT

Empiric antibiotic prescribing can be supported by guidelines and/or local antibiograms, but these have limitations. We sought to apply statistical learning to data from a comprehensive electronic health record to develop predictive models for individual antibiotics that incorporate patient- and hospital-specific factors. This paper reports on the development and validation of these models with a large retrospective cohort. This was a retrospective cohort study including hospitalized patients with positive urine cultures in the first 48 h of hospitalization at a 1,500-bed tertiary-care hospital over a 4.5-year period. All first urine cultures with susceptibilities were included. Statistical learning techniques, including penalized logistic regression, were used to create predictive models for cefazolin, ceftriaxone, ciprofloxacin, cefepime, and piperacillin-tazobactam. These were validated on a held-out cohort. The final data set used for analysis included 6,366 patients. Final model covariates included demographics, comorbidity score, recent antibiotic use, recent antimicrobial resistance, and antibiotic allergies. Models had acceptable to good discrimination in the training data set and acceptable performance in the validation data set, with a point estimate for area under the receiver operating characteristic curve (AUC) that ranged from 0.65 for ceftriaxone to 0.69 for cefazolin. All models had excellent calibration. We used electronic health record data to create predictive models to estimate antibiotic susceptibilities for urinary tract infections in hospitalized patients. Our models had acceptable performance in a held-out validation cohort.
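A penalized logistic regression with held-out validation, as described above, can be sketched as follows. The features and data are synthetic stand-ins for the EHR covariates named in the abstract (demographics, comorbidity score, prior antibiotic use, prior resistance, allergies), and the penalty strength is illustrative; the study would have tuned it.

```python
# Sketch of an L2-penalized logistic regression for antibiotic susceptibility,
# evaluated on a held-out set. Data are synthetic stand-ins for EHR covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 8))
# Susceptibility label generated from a sparse logistic model.
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
# penalty="l2" with C controlling regularization strength (smaller C = stronger).
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.3f}")
```

One such model would be fit per antibiotic (cefazolin, ceftriaxone, ciprofloxacin, cefepime, piperacillin-tazobactam), each yielding its own held-out AUC and calibration assessment.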


Subjects
Urinary Tract Infections, Anti-Bacterial Agents/therapeutic use, Hospitals, Humans, Microbial Sensitivity Tests, Retrospective Studies, Urinary Tract Infections/drug therapy
9.
Proceedings VLDB Endowment ; 11(13): 2263-2276, 2018 Sep.
Article in English | MEDLINE | ID: mdl-31179156

ABSTRACT

An important step in data preparation involves dealing with incomplete datasets. In some cases, the missing values are unreported because they are characteristics of the domain and are known by practitioners. Due to this nature of the missing values, imputation and inference methods do not work and input from domain experts is required. A common method for experts to fill missing values is through rules. However, for large datasets with thousands of missing data points, it is laborious and time consuming for a user to make sense of the data and formulate effective completion rules. Thus, users need to be shown subsets of the data that will have the most impact in completing missing fields. Further, these subsets should provide the user with enough information to make an update. Choosing subsets that maximize the probability of filling in missing data from a large dataset is computationally expensive. To address these challenges, we present ICARUS, which uses a heuristic algorithm to show the user small subsets of the database in the form of a matrix. This allows the user to iteratively fill in data by applying suggested rules based on their direct edits to the matrix. The suggested rules amplify the users' input to multiple missing fields by using the database schema to infer hierarchies. Simulations show ICARUS has an average improvement of 50% across three datasets over the baseline system. Further, in-person user studies demonstrate that naive users can fill in 68% of missing data within an hour, while manual rule specification spans weeks.
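The core idea of amplifying a single user edit into a completion rule can be sketched as below. The table, rule shape, and field names are hypothetical; ICARUS additionally chooses which matrix subsets to show (the heuristic part) and infers hierarchies from the database schema, both omitted here.

```python
# Sketch of rule-based completion in the spirit of ICARUS: one direct edit
# ("city=Boston implies state=MA") is generalized into a rule and applied to
# every other row with the same key. Subset selection and schema-derived
# hierarchies are omitted; rows and fields are hypothetical.
rows = [
    {"city": "Boston", "state": None},
    {"city": "Boston", "state": None},
    {"city": "Austin", "state": "TX"},
]

def suggest_rule(row, edited_field, value, key_field):
    """Generalize one direct edit into a (key -> value) completion rule."""
    return {"if": (key_field, row[key_field]), "set": (edited_field, value)}

def apply_rule(rows, rule):
    """Fill every missing field matched by the rule; never overwrite data."""
    (kf, kv), (sf, sv) = rule["if"], rule["set"]
    for r in rows:
        if r[kf] == kv and r[sf] is None:
            r[sf] = sv
    return rows

# The user edits one cell; the system proposes and applies the amplified rule.
rule = suggest_rule(rows[0], "state", "MA", "city")
apply_rule(rows, rule)
print(rows)
```

This amplification is why a single reviewed edit can complete many missing fields at once, which is how the system reaches large portions of a dataset from a small number of user interactions.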
